81 research outputs found

    Cepstral trajectories in linguistic units for text-independent speaker recognition

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-35292-8_3 (Proceedings of IberSPEECH, held in Madrid, Spain, in 2012). In this paper, the contributions of different linguistic units to the speaker recognition task are explored by means of temporal trajectories of their MFCC features. Inspired by successful work in forensic speaker identification, we extend the approach based on temporal contours of formant frequencies in linguistic units to design a fully automatic system that brings together the forensic and automatic speaker recognition worlds. The combination of MFCC features and unit-dependent trajectories provides a powerful tool to extract individualizing information. At a fine-grained level, we provide a calibrated likelihood ratio per linguistic unit under analysis (extremely useful in applications such as forensics), and at a coarse-grained level, we combine the individual contributions of the different units to obtain a highly discriminative single system. This approach has been tested with NIST SRE 2006 datasets and protocols, consisting of 9,720 trials from 219 male speakers for the 1side-1side English-only task, with development data extracted from 1,808 conversations by 367 male speakers in the NIST SRE 2004 and 2005 datasets. Supported by MEC grant PR-2010-123, MICINN project TEC09-14179, ForBayes project CCG10-UAM/TIC-5792 and Cátedra UAM-Telefónica.

    Improved i-Vector Representation for Speaker Diarization

    This paper proposes using a previously well-trained deep neural network (DNN) to enhance the i-vector representation used for speaker diarization. In effect, we replace the Gaussian Mixture Model (GMM) typically used to train a Universal Background Model (UBM) with a DNN that has been trained using a different large-scale dataset. To train the T-matrix, we use a supervised UBM obtained from the DNN, using filterbank input features to calculate the posterior information and then MFCC features to train the UBM, instead of a traditional unsupervised UBM derived from single features. Next, we jointly use DNN and MFCC features to calculate the zeroth- and first-order Baum-Welch statistics for training an extractor from which we obtain the i-vector. The system will be shown to achieve a significant improvement on the NIST 2008 speaker recognition evaluation (SRE) telephone data task compared to state-of-the-art approaches.

    Sesquiterpenes from aerial parts of Ferula vesceritensis

    From the dichloromethane extract of aerial parts of Ferula vesceritensis (Apiaceae), 11 sesquiterpene derivatives were isolated. Among them, five were new compounds designated as 10-hydroxylancerodiol-6-anisate, 2,10-diacetyl-8-hydroxyferutriol-6-anisate, 10-hydroxylancerodiol-6-benzoate, vesceritenone and epoxy-vesceritenol. The six known compounds were identified as feselol, farnesiferol A, lapidol, 2-acetyl-jaeschkeanadiol-6-anisate, lasidiol-10-anisate and 10-oxo-jaeschkeanadiol-6-anisate. All the structures were determined by extensive spectroscopic studies including 1D and 2D NMR experiments and mass spectroscopy analysis. Two of the compounds, the sesquiterpene coumarins farnesiferol A and feselol, bound to the model recombinant nucleotide-binding site of an MDR-like efflux pump from the enteropathogenic protozoan Cryptosporidium parvum.

    NeuroSpeech

    NeuroSpeech is software for modeling pathological speech signals along several speech dimensions: phonation, articulation, prosody, and intelligibility. Although it was developed to model dysarthric speech signals from Parkinson's patients, its structure allows other computer scientists or developers to include other pathologies and/or measures. Different tasks can be performed: (1) modeling of the signals considering the aforementioned speech dimensions, (2) automatic discrimination of Parkinson's vs. non-Parkinson's, and (3) prediction of the neurological state according to the Unified Parkinson's Disease Rating Scale (UPDRS) score. The prediction of the dysarthria level according to the Frenchay Dysarthria Assessment scale is also provided.

    Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

    Zazo R, Lozano-Diez A, Gonzalez-Dominguez J, T. Toledano D, Gonzalez-Rodriguez J (2016) Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks. PLoS ONE 11(1): e0146917. doi:10.1371/journal.pone.0146917. Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vector and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (approximately 3 s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task) by up to 26%. This result is in line with previously published research using proprietary LSTM implementations and huge computational resources, which made those earlier results hard to reproduce. Further, we extend those previous experiments by modeling unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we also analyze the effect of even more limited test data (from 2.25 s to 0.1 s), showing that with as little as 0.5 s an accuracy of over 50% can be achieved. This work has been supported by project CMC-V2: Caracterizacion, Modelado y Compensacion de Variabilidad en la Señal de Voz (TEC2012-37585-C02-01), funded by Ministerio de Economia y Competitividad, Spain.

    An investigation of supervector regression for forensic voice comparison on small data

    The present paper deals with an observer design for a nonlinear lateral vehicle model. The nonlinear model is represented by an exact Takagi-Sugeno (TS) model via the sector nonlinearity transformation. A proportional multiple integral observer (PMIO) based on the TS model is designed to estimate simultaneously the state vector and the unknown input (road curvature). The convergence conditions of the estimation error are expressed as LMIs using Lyapunov theory, which guarantees a bounded error. Simulations are carried out and experimental results are provided to illustrate the proposed observer.